14 research outputs found

Tropical Geometry of Phylogenetic Tree Space: A Statistical Perspective

Author: Kang Qiwen
Lin Bo
Monod Anthea
Yoshida Ruriko
Publication date: 13/08/2020

Phylogenetic trees are the fundamental mathematical representation of evolutionary processes in biology. As data objects, they are characterized by the challenges associated with "big data," as well as the complication that their discrete geometric structure results in a non-Euclidean phylogenetic tree space, which poses computational and statistical limitations. We propose and study a novel framework to study sets of phylogenetic trees based on tropical geometry. In particular, we focus on characterizing our framework for statistical analyses of evolutionary biological processes represented by phylogenetic trees. Our setting exhibits analytic, geometric, and topological properties that are desirable for theoretical studies in probability and statistics, as well as increased computational efficiency over the current state-of-the-art. We demonstrate our approach on seasonal influenza data. Comment: 28 pages, 5 figures, 1 table

arXiv.org e-Print Archive

A Quasi-Likelihood Approach to Zero-Inflated Spatial Count Data

Author: Monod Anthea
Publication venue: Lausanne, EPFL
Publication date: 23/07/2012

The increased accessibility of data that are geographically referenced and correlated increases the demand for techniques of spatial data analysis. The subset of such data comprised of discrete counts exhibits particular difficulties and the challenges further increase when a large proportion (typically 50% or more) of the counts are zero-valued. Such scenarios arise in many applications in numerous fields of research and it is often desirable to infer on subtleties of the process, despite the lack of substantive information obscuring the underlying stochastic mechanism generating the data. An ecological example provides the impetus for the research in this thesis: when observations for a species are recorded over a spatial region, and many of the counts are zero-valued, are the abundant zeros due to bad luck, or are aspects of the region making it unsuitable for the survival of the species? In the framework of generalized linear models, we first develop a zero-inflated Poisson generalized linear regression model, which explains the variability of the responses given a set of measured covariates, and additionally allows for the distinction of two kinds of zeros: sampling ("bad luck" zeros), and structural (zeros that provide insight into the data-generating process). We then adapt this model to the spatial setting by incorporating dependence within the model via a general, leniently-defined quasi-likelihood strategy, which provides consistent, efficient and asymptotically normal estimators, even under erroneous assumptions of the covariance structure. In addition to this advantage of robustness to dependence misspecification, our quasi-likelihood model overcomes the need for the complete specification of a probability model, thus rendering it very general and relevant to many settings. To complement the developed regression model, we further propose methods for the simulation of zero-inflated spatial stochastic processes.
This is done by deconstructing the entire process into a mixed, marked spatial point process: we augment existing algorithms for the simulation of spatial marked point processes to comprise a stochastic mechanism to generate zero-abundant marks (counts) at each location. We propose several such mechanisms, and consider interaction and dependence processes for random locations as well as over a lattice.

Infoscience - École polytechnique fédérale de Lausanne

$k$-Means Clustering for Persistent Homology

Author: Cao Yueqi
Leung Prudence
Monod Anthea
Publication date: 30/07/2023

Persistent homology is a methodology central to topological data analysis that extracts and summarizes the topological features within a dataset as a persistence diagram; it has recently gained much popularity from its myriad successful applications to many domains. However, its algebraic construction induces a metric space of persistence diagrams with a highly complex geometry. In this paper, we prove convergence of the $k$-means clustering algorithm on persistence diagram space and establish theoretical properties of the solution to the optimization problem in the Karush-Kuhn-Tucker framework. Additionally, we perform numerical experiments on various representations of persistent homology, including embeddings of persistence diagrams as well as diagrams themselves and their generalizations as persistence measures; we find that clustering performance directly on persistence diagrams and measures outperforms that on their vectorized representations. Comment: 20 pages, 6 figures

arXiv.org e-Print Archive

Fast Topological Signal Identification and Persistent Cohomological Cycle Matching

Author: García-Redondo Inés
Monod Anthea
Song Anna
Publication date: 30/09/2022

Within the context of topological data analysis, the problems of identifying topological significance and matching signals across datasets are important and useful inferential tasks in many applications. The limitation of existing solutions to these problems, however, is computational speed. In this paper, we harness the state-of-the-art for persistent homology computation by studying the problem of determining topological prevalence and cycle matching using a cohomological approach, which increases their feasibility and applicability to a wider variety of applications and contexts. We demonstrate this on a wide range of real-life, large-scale, and complex datasets. We extend existing notions of topological prevalence and cycle matching to include general non-Morse filtrations. This provides the most general and flexible state-of-the-art adaptation of topological signal identification and persistent cycle matching, which performs comparisons of orders of ten for thousands of sampled points in a matter of minutes on standard institutional HPC CPU facilities.

arXiv.org e-Print Archive

Probability Metrics for Tropical Spaces of Different Dimensions

Author: Cao Yueqi
Drton Mathias
Monod Anthea
Talbut Roan
Tramontano Daniele
Publication date: 06/07/2023

The problem of comparing probability distributions is at the heart of many tasks in statistics and machine learning and the most classical comparison methods assume that the distributions occur in spaces of the same dimension. Recently, a new geometric solution has been proposed to address this problem when the measures live in Euclidean spaces of differing dimensions. Here, we study the same problem of comparing probability distributions of different dimensions in the tropical geometric setting, which is becoming increasingly relevant in computations and applications involving complex, geometric data structures. Specifically, we construct a Wasserstein distance between measures on different tropical projective tori (the focal metric spaces in both theory and applications of tropical geometry) via tropical mappings between probability measures. We prove equivalence of the directionality of the maps, whether starting from the lower dimensional space and mapping to the higher dimensional space or vice versa. As an important practical implication, our work provides a framework for comparing probability distributions on the spaces of phylogenetic trees with different leaf sets. Comment: 15 pages

arXiv.org e-Print Archive

Topological Data Analysis of Database Representations for Information Retrieval

Author: Cao Yueqi
Kainz Bernhard
Monod Anthea
Schmidtke Luca
Vlontzos Athanasios
Publication date: 04/04/2021

Appropriately representing elements in a database so that queries may be accurately matched is a central task in information retrieval. This has recently been achieved by embedding the graphical structure of the database into a manifold so that the hierarchy is preserved. Persistent homology provides a rigorous characterization for the database topology in terms of both its hierarchy and connectivity structure. We compute persistent homology on a variety of datasets and show that some commonly used embeddings fail to preserve the connectivity. Moreover, we show that embeddings which successfully retain the database topology coincide in persistent homology. We introduce the dilation-invariant bottleneck distance to capture this effect, which addresses metric distortion on manifolds. We use it to show that distances between topology-preserving embeddings of databases are small. Comment: 15 pages, 7 figures

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

Quantitative Analysis of Immune Infiltrates in Primary Melanoma.

Author: Armenta Paul M.
Askin Kayleigh N.
Blake Zoë
Davari Danielle R.
Drake Charles G.
Esancy Camden L.
Fu Yichun
Gartrell Robyn D.
Hart Thomas D.
Horst Basil
Izaki Daisuke
Jia Dan Tong
Kaufman Howard L.
Li Gen
Lu Yan
Marks Douglas K.
Monod Anthea
Rabadan Raul
Saenger Yvonne M.
Stack Edward C.
Taback Bret
Wu Alan
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018

Novel methods to analyze the tumor microenvironment (TME) are urgently needed to stratify melanoma patients for adjuvant immunotherapy. Tumor-infiltrating lymphocyte (TIL) analysis, by conventional pathologic methods, is predictive but is insufficiently precise for clinical application. Quantitative multiplex immunofluorescence (qmIF) allows for evaluation of the TME using multiparameter phenotyping, tissue segmentation, and quantitative spatial analysis (qSA). Given that CD3+CD8+ cytotoxic lymphocytes (CTLs) promote antitumor immunity, whereas CD68+ macrophages impair immunity, we hypothesized that quantification and spatial analysis of macrophages and CTLs would correlate with clinical outcome. We applied qmIF to 104 primary stage II to III melanoma tumors and found that CTLs were closer in proximity to activated (CD68+HLA-DR+) macrophages than nonactivated (CD68+HLA-DR-) macrophages (P < 0.0001). CTLs were farther from proliferating SOX10+ melanoma cells than from nonproliferating ones (P < 0.0001). In 64 patients with known cause of death, we found that high CTL and low macrophage density in the stroma (P = 0.0038 and P = 0.0006, respectively) correlated with disease-specific survival (DSS), but the correlation was less significant for CTL and macrophage density in the tumor (P = 0.0147 and P = 0.0426, respectively). DSS correlation was strongest for stromal HLA-DR+ CTLs (P = 0.0005). CTL distance to HLA-DR- macrophages associated with poor DSS (P = 0.0016), whereas distance to Ki67- tumor cells associated inversely with DSS (P = 0.0006). A low CTL/macrophage ratio in the stroma conferred a hazard ratio (HR) of 3.719 for death from melanoma and correlated with shortened overall survival (OS) in the complete 104 patient cohort by Cox analysis (P = 0.009) and merits further development as a biomarker for clinical application.

Columbia University Academic Commons

Tropical Foundations for Probability & Statistics on Phylogenetic Tree Space

Author: Lin Bo
Monod Anthea
Yoshida Ruriko
Publication date: 04/06/2018

Preprint. We introduce a novel framework for the statistical analysis of phylogenetic trees: palm tree space is constructed on principles of tropical algebraic geometry, and represents a phylogenetic tree as a point in a space endowed with the tropical metric. We show that palm tree space possesses a variety of properties that allow for the definition of probability measures, and thus expectations, variances, and other fundamental statistical quantities. This provides a new, tropical basis for a statistical treatment of evolutionary biological processes represented by phylogenetic trees. In particular, we show that a geometric approach to phylogenetic tree space (first introduced by Billera, Holmes, and Vogtmann, which we reinterpret in this paper via tropical geometry) results in analytic, geometric, and topological characteristics that are desirable for probability, statistics, and increased computational efficiency.

Calhoun, Institutional Archive of the Naval Postgraduate School